Manual flow cytometry analysis with Python¶
Table of contents¶
Why analyse flow cytometry data in Python?¶
With easy to use, graphical tools like FlowJo, FCSExpress and Cytobank, you might wonder why someone would want to manually analyse their flow cytometry data by writing Python code. It’s true that for most basic cytometry experiments it will be easier and faster to use one of these platforms, but there are benefits to using Python:
- It’s free and can be run on any computer without a license!
- You can create re-useable analysis scripts that allow others to reproduce your analyses
- Analysis scripts can be automatically run periodically to analyse new screening data in a folder
- Access to a broader range of computational, statistical and plotting tools
- Managing plugins (packages) is usually easier!
- It makes you look cool ;)
In this tutorial we're going to use the flowkit package as it's probably the most mature and fully featured, but others exist such as CytoPy and pytometry. The flowkit package uses the bokeh package for its interactive graphics, so we import the show method from its plotting module. So that these interactive plots render in this notebook, we also need to call bokeh.io.output_notebook(). Don't forget that before using these packages you will need to install them, for example using pip install bokeh flowkit.
import bokeh
from bokeh.plotting import show
import flowkit as fk
import numpy as np
import pandas as pd
bokeh.io.output_notebook()
Working with single .fcs files: the sample class¶
We’ve already seen data structures such as lists, dictionaries, NumPy arrays, and pandas DataFrames. The flowkit package provides a new data structure to contain data from an .fcs file: the sample.
Reading and exploring an .fcs file¶
To read an .fcs file into Python, we use Sample() function from flowkit. If we call the sample object we just created, we get a print out just telling us the version of the .fcs file (version 3.0 in this case), the filename. the number of parameters (or "channels" but this technically has a different meaning in flow and we should avoid calling parameters "channels". Anyway, there are 10 here), and the number of events.
sample = fk.Sample('data/raw/OpA_I2_C2_IM1_WB1_R1.fcs')
sample
Sample(v3.1, OpA_I2_C2_IM1_WB1_R1.fcs, 10 channels, 400000 events)
We can get some information abaout the parameters in the sample by looking at its channels attribute. This is displayed as a DataFrame showing us the short parameter name (pnn), the optional parameter label (pns), the amplifier gain (png; 1.0 = no gain), the lin/log amplification type (pne; required but rarely used now), and the upper value or range (pnr).
sample.channels
| channel_number | pnn | pns | png | pne | pnr | |
|---|---|---|---|---|---|---|
| 0 | 1 | FSC-A | 1.0 | (0.0, 0.0) | 262144.0 | |
| 1 | 2 | FSC-H | 1.0 | (0.0, 0.0) | 262144.0 | |
| 2 | 3 | SSC-A | 1.0 | (0.0, 0.0) | 262144.0 | |
| 3 | 4 | FITC-A | CD4 | 1.0 | (0.0, 0.0) | 262144.0 |
| 4 | 5 | PE-A | CCR7 | 1.0 | (0.0, 0.0) | 262144.0 |
| 5 | 6 | PerCP-Cy5-5-A | CD8 | 1.0 | (0.0, 0.0) | 262144.0 |
| 6 | 7 | PE-Cy7-A | CD3 | 1.0 | (0.0, 0.0) | 262144.0 |
| 7 | 8 | APC-A | CD45RA | 1.0 | (0.0, 0.0) | 262144.0 |
| 8 | 9 | APC-Cy7-A | CD45 | 1.0 | (0.0, 0.0) | 262144.0 |
| 9 | 10 | Time | 1.0 | (0.0, 0.0) | 262144.0 |
We can get lists of the PnN and PnS labels by getting the pnn_labels and pns_labels attributes, respectively.
sample.pnn_labels
['FSC-A', 'FSC-H', 'SSC-A', 'FITC-A', 'PE-A', 'PerCP-Cy5-5-A', 'PE-Cy7-A', 'APC-A', 'APC-Cy7-A', 'Time']
sample.pns_labels
['', '', '', 'CD4', 'CCR7', 'CD8', 'CD3', 'CD45RA', 'CD45', '']
We can extract the raw, event-level data either as a NumPy array or pandas DataFrame using the get_events() and as_dataframe() methods, respectively. The argument source allows us to specify the kind of data we want and can take the following values:
'raw', returns uncompensated, untransforme data'comp', returns compensated, untransformed data'xform', returns transformed data (compensated if compensation was performed)
We will only have access to compensated and/or transformed values once we have performed these steps later.
sample.get_events(source = 'raw')
array([[1.51025281e+05, 1.16233000e+05, 1.92647828e+05, ...,
7.40429993e+02, 6.16455029e+03, 1.00000005e-03],
[1.16219039e+05, 9.49560000e+04, 6.08364609e+04, ...,
2.73942017e+03, 3.36955508e+04, 2.00000009e-03],
[1.45838562e+05, 1.11304000e+05, 1.97575422e+05, ...,
2.71890015e+02, 2.48805005e+03, 3.00000003e-03],
...,
[1.48042719e+05, 1.07230000e+05, 1.53985562e+05, ...,
5.19840027e+02, 1.42956006e+03, 2.00011002e+02],
[1.40678719e+05, 1.03240000e+05, 2.42794891e+05, ...,
9.09720032e+02, 1.33209004e+04, 2.00011002e+02],
[2.01164328e+05, 1.45805000e+05, 2.46083188e+05, ...,
4.25790009e+02, 3.51405005e+03, 2.00011002e+02]])
sample.as_dataframe(source = 'raw')
| pnn | FSC-A | FSC-H | SSC-A | FITC-A | PE-A | PerCP-Cy5-5-A | PE-Cy7-A | APC-A | APC-Cy7-A | Time |
|---|---|---|---|---|---|---|---|---|---|---|
| pns | CD4 | CCR7 | CD8 | CD3 | CD45RA | CD45 | ||||
| 0 | 151025.281250 | 116233.0 | 192647.828125 | 351.140015 | 873.000000 | 172.660004 | 964.180054 | 740.429993 | 6164.550293 | 0.001000 |
| 1 | 116219.039062 | 94956.0 | 60836.460938 | 8130.540039 | 2537.520020 | 535.440002 | 28739.160156 | 2739.420166 | 33695.550781 | 0.002000 |
| 2 | 145838.562500 | 111304.0 | 197575.421875 | 353.080017 | 1519.020020 | 89.240005 | 863.300049 | 271.890015 | 2488.050049 | 0.003000 |
| 3 | 145582.078125 | 111470.0 | 203096.671875 | 492.760010 | 5422.300293 | 356.960022 | 2227.120117 | 579.690002 | 2491.469971 | 0.003000 |
| 4 | 85385.437500 | 69691.0 | 46193.339844 | 5377.680176 | 3166.080078 | 816.740051 | 43981.742188 | 2547.900146 | 25853.490234 | 0.005000 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 399995 | 140499.515625 | 112582.0 | 221049.421875 | 100.880005 | 1140.720093 | 194.000000 | 1994.320068 | 478.800018 | 8818.469727 | 200.009003 |
| 399996 | 158901.125000 | 121242.0 | 128474.562500 | 5032.360352 | 1689.739990 | 31.040001 | 1313.380005 | 1451.790039 | 14246.010742 | 200.009995 |
| 399997 | 148042.718750 | 107230.0 | 153985.562500 | 413.220001 | 995.220032 | 182.360001 | 234.740005 | 519.840027 | 1429.560059 | 200.011002 |
| 399998 | 140678.718750 | 103240.0 | 242794.890625 | 164.900009 | 564.540039 | 190.120010 | 830.320007 | 909.720032 | 13320.900391 | 200.011002 |
| 399999 | 201164.328125 | 145805.0 | 246083.187500 | 593.640015 | 735.260010 | 161.020004 | 172.660004 | 425.790009 | 3514.050049 | 200.011002 |
400000 rows × 10 columns
To extract a dictionary of FCS keywords, we use the get_metadata() method, further subsetting this using the key of the one we want. Below we can see this data was acquired on a FACSCanto instrument.
sample.get_metadata()
{'beginanalysis': '0',
'begindata': '2698',
'beginstext': '0',
'byteord': '1,2,3,4',
'datatype': 'F',
'endanalysis': '0',
'enddata': '16002697',
'endstext': '0',
'mode': 'L',
'nextdata': '0',
'par': '10',
'tot': '400000',
'p1b': '32',
'p1e': '0,0',
'p1g': '1.0',
'p1r': '262144',
'p1n': 'FSC-A',
'p2b': '32',
'p2e': '0,0',
'p2g': '1.0',
'p2r': '262144',
'p2n': 'FSC-H',
'p3b': '32',
'p3e': '0,0',
'p3g': '1.0',
'p3r': '262144',
'p3n': 'SSC-A',
'p4b': '32',
'p4e': '0,0',
'p4g': '1.0',
'p4r': '262144',
'p4n': 'FITC-A',
'p4s': 'CD4',
'p5b': '32',
'p5e': '0,0',
'p5g': '1.0',
'p5r': '262144',
'p5n': 'PE-A',
'p5s': 'CCR7',
'p6b': '32',
'p6e': '0,0',
'p6g': '1.0',
'p6r': '262144',
'p6n': 'PerCP-Cy5-5-A',
'p6s': 'CD8',
'p7b': '32',
'p7e': '0,0',
'p7g': '1.0',
'p7r': '262144',
'p7n': 'PE-Cy7-A',
'p7s': 'CD3',
'p8b': '32',
'p8e': '0,0',
'p8g': '1.0',
'p8r': '262144',
'p8n': 'APC-A',
'p8s': 'CD45RA',
'p9b': '32',
'p9e': '0,0',
'p9g': '1.0',
'p9r': '262144',
'p9n': 'APC-Cy7-A',
'p9s': 'CD45',
'p10b': '32',
'p10e': '0,0',
'p10g': '1.0',
'p10r': '262144',
'p10n': 'Time',
'fil': 'OpA_I2_C2_IM1_WB1_R1.fcs',
'sys': 'Windows XP 5.1',
'src': ' ',
'date': '19-APR-2018',
'btim': '13:20:39',
'etim': '13:24:02',
'cyt': 'FACSCanto',
'op': ' ',
'inst': ' ',
'p1v': '97',
'p2v': '97',
'p3v': '423',
'p4v': '483',
'p5v': '620',
'p6v': '544',
'p7v': '754',
'p8v': '582',
'p9v': '706',
'creator': 'BD FACSDiva Software Version 6.1.3',
'tube name': ' ',
'experiment name': ' ',
'guid': '2ab5d204-673d-4b77-a251-591efc88030c',
'cytnum': '1',
'window extension': '7.00',
'export user name': ' ',
'export time': '14-MAY-2018-13:48:28',
'fsc asf': '1.12',
'autobs': 'TRUE',
'laser1name': 'Blue',
'laser1delay': '0.00',
'laser1asf': '1.94',
'laser2name': 'Red',
'laser2delay': '28.48',
'laser2asf': '1.71',
'spill': '6,FITC-A,PE-A,PerCP-Cy5-5-A,PE-Cy7-A,APC-A,APC-Cy7-A,1,0.20000000316200833,0.010000259790764056,0,0,0,0.004999996329852424,1,0.037000002693247325,0.0040000050732601836,0,0,0.0020290974735818383,0.03500123911686522,1,1.3499999934932951,0.02000015592539009,0.09999999876097611,0.0005655546867790353,0.012001239604607583,0.013000054959324544,1,0.00042784745930640527,0.023518117238373906,0,0,0.0032115590263317095,0,1,0.06999999330644507,0,0.0005826769847731325,0.002967216706987719,0.050000015483425794,0.06500000567621908,1',
'apply compensation': 'TRUE',
'threshold': 'FSC,20000',
'p1display': 'LIN',
'p1bs': '0',
'p1ms': '0',
'p2display': 'LIN',
'p2bs': '0',
'p2ms': '0',
'p3display': 'LIN',
'p3bs': '0',
'p3ms': '0',
'p4display': 'LOG',
'p4bs': '68',
'p4ms': '0',
'p5display': 'LOG',
'p5bs': '144',
'p5ms': '0',
'p6display': 'LOG',
'p6bs': '185',
'p6ms': '0',
'p7display': 'LOG',
'p7bs': '1230',
'p7ms': '0',
'p8display': 'LOG',
'p8bs': '399',
'p8ms': '0',
'p9display': 'LOG',
'p9bs': '658',
'p9ms': '0',
'p10bs': '0',
'p10ms': '0',
'sample id': ' ',
'patient id': ' ',
'case number': ' ',
'cst setup status': 'SUCCESS',
'cst beads lot id': '41720',
'cytometer config name': ' ',
'cytometer config create date': '11/03/2016 01:04:40 PM',
'cst setup date': '12/06/2016 02:33:54 PM',
'cst baseline date': '11/03/2016 01:20:27 PM'}
sample.get_metadata()['cyt']
'FACSCanto'
Compensating a sample¶
If your panel of fluorophores exhibit spectral overlap with each other, it’s important to compensate your data using single colour controls. In this section we’re going to compensate our data using the compensation matrix calculated by the instrument during acquisition. This compensation matrix is saved as a keyword in each .fcs file from the experiment. We could also import a compensation matrix from FlowJo.
For this .fcs file, the spillover matrix is stored in the 'spill' keyword, but different instrument vendors may use 'spillover' or '$SPILLOVER'. Once we extract the spillover matrix (as a string actually), we use the apply_compensation() method to compensate the data. The sample now contains both raw and comp data and we can access either. We can also modify the compensation matric if we wish, but we won't do that in this tutorial.
comp_mat = sample.metadata['spill']
sample.apply_compensation(comp_mat)
sample.compensation.as_dataframe()
| FITC-A | PE-A | PerCP-Cy5-5-A | PE-Cy7-A | APC-A | APC-Cy7-A | |
|---|---|---|---|---|---|---|
| FITC-A | 1.000000 | 0.200000 | 0.010000 | 0.000 | 0.000000 | 0.000000 |
| PE-A | 0.005000 | 1.000000 | 0.037000 | 0.004 | 0.000000 | 0.000000 |
| PerCP-Cy5-5-A | 0.002029 | 0.035001 | 1.000000 | 1.350 | 0.020000 | 0.100000 |
| PE-Cy7-A | 0.000566 | 0.012001 | 0.013000 | 1.000 | 0.000428 | 0.023518 |
| APC-A | 0.000000 | 0.000000 | 0.003212 | 0.000 | 1.000000 | 0.070000 |
| APC-Cy7-A | 0.000000 | 0.000583 | 0.002967 | 0.050 | 0.065000 | 1.000000 |
Transforming parameter values¶
Most modern flow cytometers export linear data for all parameters. We usually need to apply some transformation to the fluorescence parameters to visualise the distribution of the data across orders of magnitude.
There are different transformations we can apply. For mass cytometry data the asinh transform is commonly used. For fluorescence cytometry data, the logicle transformation is a logical choice ;). flowkit has functions for applying asinh and logicle transformations, but here we're going to apply the biexponential transformation, with the same parameters as those found in FlowJo. We first define the transformation using fk.transforms.WSPBiexTransform(), giving our transformation a name and setting its parameters. I've stuck with the default arguments here, but width and negative are the main ones you might wish to play with (width mainly).
We then use the apply_transform() method to transform our fluorescence values. This method automatically identifies fluorescent parameters, though we could supply a dictionary of parameter:transform pairs to a) specify which parameters we wanted to transform, and b) apply different transforms per parameter.
biex_xform = fk.transforms.WSPBiexTransform(
'biex',
max_value = 262144,
positive = 4.42,
width = -10,
negative = 0
)
sample.apply_transform(biex_xform)
Let's remind ourselves of the parameters, the labels and numbers.
sample.channels
| channel_number | pnn | pns | png | pne | pnr | |
|---|---|---|---|---|---|---|
| 0 | 1 | FSC-A | 1.0 | (0.0, 0.0) | 262144.0 | |
| 1 | 2 | FSC-H | 1.0 | (0.0, 0.0) | 262144.0 | |
| 2 | 3 | SSC-A | 1.0 | (0.0, 0.0) | 262144.0 | |
| 3 | 4 | FITC-A | CD4 | 1.0 | (0.0, 0.0) | 262144.0 |
| 4 | 5 | PE-A | CCR7 | 1.0 | (0.0, 0.0) | 262144.0 |
| 5 | 6 | PerCP-Cy5-5-A | CD8 | 1.0 | (0.0, 0.0) | 262144.0 |
| 6 | 7 | PE-Cy7-A | CD3 | 1.0 | (0.0, 0.0) | 262144.0 |
| 7 | 8 | APC-A | CD45RA | 1.0 | (0.0, 0.0) | 262144.0 |
| 8 | 9 | APC-Cy7-A | CD45 | 1.0 | (0.0, 0.0) | 262144.0 |
| 9 | 10 | Time | 1.0 | (0.0, 0.0) | 262144.0 |
We can plot two parameters against each other to visualize the effectiveness of the compensation and transformation using the plot_scatter() method (slightly confusingly this refers to a scatter plot, not a plot of the light-scatter parameters), we can either provide the "channel_number" or the "pnn" from the table above to set the x and y axes. Setting source = 'xform' ensures we are visualizing the compensated, transformed data. By default, a subsample of the events (determined when the sample is loaded) is plotted, but you could plot everything by setting subsample = False. Once we've created the plot, we need to visualise it by calling show(). Play around with the width argument of the fk.transforms.WSPBiexTransform() above and see how this impacts the plotted data.
p = sample.plot_scatter(4, 6, source = 'xform', subsample = True)
show(p)
p = sample.plot_scatter('FITC-A', 'PerCP-Cy5-5-A', source = 'xform', subsample = True)
show(p)
It's a good idea to visualise all the parameters to ensure the transformation we applied is suitable for all (and to identify any potential compensation errors). One quick way to do this is using the plot_scatter_matrix() method, which will plot all combinations of parameters we provide as a list.
We can use the export_png() function from bokeh's io module to save the image as a file. Note that if you get errors about needing to install "selenium" or a different browser, you may have to run the following in your terminal/powershell: pip install selenium and conda install -c conda-forge firefox geckodriver (these are only needed for saving the plots as images).
p = sample.plot_scatter_matrix([4, 5, 6, 7, 8, 9])
show(p)
bokeh.io.export_png(p, filename = 'plots/scatter matrix.png')
'c:\\Users\\u061745\\OneDrive - UCB\\Documents\\Projects\\Python-for-cytometry-course\\plots\\scatter matrix.png'
The method above quickly becomes unsuitable with even a modest number of parameters, so below we do this one by one against a fixed y axis of our choice.
for param in [4, 5, 6, 7, 8]:
p = sample.plot_scatter(param, 9, source = 'xform', subsample = True)
show(p)
bokeh.io.export_png(
p,
filename = 'plots/' + sample.pnn_labels[param - 1] + '.png'
)
Looks like we have some events buried below the axis on some of our parameters. We'll continue for now but I'll leave it as an exercise for you to play around with the transformation.
The flowkit package uses bokeh for its plotting, but there is nothing stopping you from extracting the data in DataFrame format and using seaborn like we did in the previous tutorial.
Manual gating¶
Just because we’re typing, doesn’t mean we can’t manually gate our data like we do in FlowJo. We can create rectangle, polygon, ellipsoid, and quadrant gates in Python just like in any other analysis software. To do this, we’re going to create a Session object which will hold the information about our gates and samples.
Once we've instantiated a session with Session() we can add all of the files in a directory to it using add-Samples() and providing the path to the files (we could also give a list of specific filepaths if we wanted). We can compensate and transform all the data using add_comp_matrix() and add_transform(), respectively, applying the compensation matrix and transformation we defined earlier. Printing the session object just tells us we have 9 samples, and we can get their ids with get_sample_ids().
session = fk.Session()
session.add_samples('data/raw/')
session.add_comp_matrix(sample.compensation)
session.add_transform(biex_xform)
print(session)
session.get_sample_ids()
Session(9 samples)
['OpA_I2_C2_IM1_WB1_R1.fcs', 'OpA_I2_C2_IM1_WB1_R2.fcs', 'OpA_I2_C2_IM1_WB1_R3.fcs', 'OpA_I2_C2_IM1_WB2_R1.fcs', 'OpA_I2_C2_IM1_WB2_R2.fcs', 'OpA_I2_C2_IM1_WB2_R3.fcs', 'OpA_I2_C2_IM1_WB3_R1.fcs', 'OpA_I2_C2_IM1_WB3_R2.fcs', 'OpA_I2_C2_IM1_WB3_R3.fcs']
Rectangle gates¶
We'll start by gating on that lymphocyte population by FSC-A and SSC-A. I wouldn't normally use a rectangle gate here but it's a good place to start. Let's plot the scatter parameters so we can get an idea of where we want our gate to go.
p = sample.plot_scatter('FSC-A', 'SSC-A')
show(p)
To create our rectangle gate, we need to define a Dimension object for each parameter of our gate (i.e. 2). We do this with the Dimension() function, specifying the parameter and it's minimum and maximum values (if range_min or range_max are not given then minimum or maximum is unbounded, respectively.) We then create a RectangleGate object using the function of the same name, giving the name of the gate and a list of our Dimension objects.
We then use the add_gate() method, giving it the RectangleGate object and the name of the parent population (we use 'root', to indicate this gate is at the top of the hierarchy). To actually gate the data and assign each event to be in or our of the gate, we use the analyze_samples() method, and finally use get_analysis_report() to get a nice tabular summary of the results.
fsc_a = fk.Dimension('FSC-A', range_min = 5e4, range_max = 1.2e5)
ssc_a = fk.Dimension('SSC-A', range_min = 1e4, range_max = 8e4)
scatter_gate = fk.gates.RectangleGate(
'scatter',
dimensions = [fsc_a, ssc_a]
)
session.add_gate(scatter_gate, gate_path = ('root',))
session.analyze_samples(verbose = True)
session.get_analysis_report()
#### Processing gates for 9 samples (multiprocessing is enabled - 9 cpus) ####
| sample | gate_path | gate_name | gate_type | quadrant_parent | parent | count | absolute_percent | relative_percent | level | |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | OpA_I2_C2_IM1_WB1_R1.fcs | (root,) | scatter | RectangleGate | None | root | 127122 | 31.78050 | 31.78050 | 1 |
| 1 | OpA_I2_C2_IM1_WB1_R2.fcs | (root,) | scatter | RectangleGate | None | root | 132175 | 33.04375 | 33.04375 | 1 |
| 2 | OpA_I2_C2_IM1_WB1_R3.fcs | (root,) | scatter | RectangleGate | None | root | 133872 | 33.46800 | 33.46800 | 1 |
| 3 | OpA_I2_C2_IM1_WB2_R1.fcs | (root,) | scatter | RectangleGate | None | root | 116529 | 29.13225 | 29.13225 | 1 |
| 4 | OpA_I2_C2_IM1_WB2_R2.fcs | (root,) | scatter | RectangleGate | None | root | 118876 | 29.71900 | 29.71900 | 1 |
| 5 | OpA_I2_C2_IM1_WB2_R3.fcs | (root,) | scatter | RectangleGate | None | root | 119713 | 29.92825 | 29.92825 | 1 |
| 6 | OpA_I2_C2_IM1_WB3_R1.fcs | (root,) | scatter | RectangleGate | None | root | 176388 | 44.09700 | 44.09700 | 1 |
| 7 | OpA_I2_C2_IM1_WB3_R2.fcs | (root,) | scatter | RectangleGate | None | root | 173472 | 43.36800 | 43.36800 | 1 |
| 8 | OpA_I2_C2_IM1_WB3_R3.fcs | (root,) | scatter | RectangleGate | None | root | 170989 | 42.74725 | 42.74725 | 1 |
It's a good idea to visualise our gate to make sure we're happy with its placement. Here, we isolate just a single test case to plot.
case = session.get_sample_ids()[0]
p = session.plot_gate(case, 'scatter')
show(p)
That looks like it captures the events we want it to. If we did want to make a change, we'll get an error if we try to define a gate that already exists, and we must first remove it with g_strat.remove_gate(<name of population>).
Polygon gates¶
Let’s use a polygon gate to exclude doublets. First, let's plot FSC-A against FSC-H using the session class's plot_scatter() method. To this we give the name of the sample we wish to plot, the x and x parameters in the form of Dimension objects (this is a bit annoying but it is what it is), and the name of the population we wish to plot (we could omit this argument to plot everything, give it a go).
fsc_a = fk.Dimension('FSC-A')
fsc_h = fk.Dimension('FSC-H')
p = session.plot_scatter(
case,
x_dim = fsc_a,
y_dim = fsc_h,
gate_name = 'scatter'
)
show(p)
To create a polygon gate in Python, you need to create a list whose elements are lists of x y coordinate pairs for each of the vertices of your polygon. In the example below, we're creating a polygon gate with four vertices where the first and second value in each pair are the FSC-A and FSC-H values, respectively. I then define a Dimension object for FSC-H (we'll reuse the FSC-A one from earlier), then use the PolygonGate() function to create the gate. We then add this gate to the GatingStrategy as before and generate the report.
vertices = [
[50000, 38000],
[100000, 80000],
[100000, 85000],
[50000, 45000]
]
singlet_gate = fk.gates.PolygonGate(
'singlets',
dimensions = [fsc_a, fsc_h],
vertices = vertices
)
session.add_gate(singlet_gate, gate_path = ('root', 'scatter'))
session.analyze_samples(verbose = True)
session.get_analysis_report()
#### Processing gates for 9 samples (multiprocessing is enabled - 9 cpus) ####
| sample | gate_path | gate_name | gate_type | quadrant_parent | parent | count | absolute_percent | relative_percent | level | |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | OpA_I2_C2_IM1_WB1_R1.fcs | (root,) | scatter | RectangleGate | None | root | 127122 | 31.78050 | 31.780500 | 1 |
| 1 | OpA_I2_C2_IM1_WB1_R1.fcs | (root, scatter) | singlets | PolygonGate | None | scatter | 115728 | 28.93200 | 91.036957 | 2 |
| 2 | OpA_I2_C2_IM1_WB1_R2.fcs | (root,) | scatter | RectangleGate | None | root | 132175 | 33.04375 | 33.043750 | 1 |
| 3 | OpA_I2_C2_IM1_WB1_R2.fcs | (root, scatter) | singlets | PolygonGate | None | scatter | 120130 | 30.03250 | 90.887082 | 2 |
| 4 | OpA_I2_C2_IM1_WB1_R3.fcs | (root,) | scatter | RectangleGate | None | root | 133872 | 33.46800 | 33.468000 | 1 |
| 5 | OpA_I2_C2_IM1_WB1_R3.fcs | (root, scatter) | singlets | PolygonGate | None | scatter | 120928 | 30.23200 | 90.331063 | 2 |
| 6 | OpA_I2_C2_IM1_WB2_R1.fcs | (root,) | scatter | RectangleGate | None | root | 116529 | 29.13225 | 29.132250 | 1 |
| 7 | OpA_I2_C2_IM1_WB2_R1.fcs | (root, scatter) | singlets | PolygonGate | None | scatter | 103375 | 25.84375 | 88.711823 | 2 |
| 8 | OpA_I2_C2_IM1_WB2_R2.fcs | (root,) | scatter | RectangleGate | None | root | 118876 | 29.71900 | 29.719000 | 1 |
| 9 | OpA_I2_C2_IM1_WB2_R2.fcs | (root, scatter) | singlets | PolygonGate | None | scatter | 106191 | 26.54775 | 89.329217 | 2 |
| 10 | OpA_I2_C2_IM1_WB2_R3.fcs | (root,) | scatter | RectangleGate | None | root | 119713 | 29.92825 | 29.928250 | 1 |
| 11 | OpA_I2_C2_IM1_WB2_R3.fcs | (root, scatter) | singlets | PolygonGate | None | scatter | 105490 | 26.37250 | 88.119085 | 2 |
| 12 | OpA_I2_C2_IM1_WB3_R1.fcs | (root,) | scatter | RectangleGate | None | root | 176388 | 44.09700 | 44.097000 | 1 |
| 13 | OpA_I2_C2_IM1_WB3_R1.fcs | (root, scatter) | singlets | PolygonGate | None | scatter | 158237 | 39.55925 | 89.709617 | 2 |
| 14 | OpA_I2_C2_IM1_WB3_R2.fcs | (root,) | scatter | RectangleGate | None | root | 173472 | 43.36800 | 43.368000 | 1 |
| 15 | OpA_I2_C2_IM1_WB3_R2.fcs | (root, scatter) | singlets | PolygonGate | None | scatter | 153580 | 38.39500 | 88.533020 | 2 |
| 16 | OpA_I2_C2_IM1_WB3_R3.fcs | (root,) | scatter | RectangleGate | None | root | 170989 | 42.74725 | 42.747250 | 1 |
| 17 | OpA_I2_C2_IM1_WB3_R3.fcs | (root, scatter) | singlets | PolygonGate | None | scatter | 140998 | 35.24950 | 82.460275 | 2 |
p = session.plot_gate(case, 'singlets')
show(p)
At this point, if we wanted to vsiaulise the hierarchy we're creating, we can do so with the get_gate_hierarchy() method.
print(session.get_gate_hierarchy())
root
╰── scatter
╰── singlets
Ellipsoid gates¶
Next, let’s gate on our lymphocyte population using an ellipsoid gate around the CD45+ CD3+ events. Again, let’s plot the parameters and highlight the population we’re interested in to help us draw the gate.
cd45 = fk.Dimension('APC-Cy7-A')
cd3 = fk.Dimension('PE-Cy7-A')
p = session.plot_scatter(
case,
x_dim = cd45,
y_dim = cd3,
gate_name = 'singlets'
)
show(p)
Hmm, that doesn't look right, what's happened? Well as we're now working with fluorescence parameters instead of scatter parameters, we need to specifically ask for compensated and transformed data, and we do this inside the definition of each Dimension object. This might seem clunky, but it's because a Session object can contain multiple compensation matrices and transformations, especially if loaded in from a FlowJo workspace file. We can get the name of the compensation matric using the get_comp_matrices() method (noting there is only one here called 'custom_spill'), and we know the name of the biexponential transformation we defined earlier ('biex').
session.get_comp_matrices()
[Matrix(custom_spill, dims: 6)]
We now modify the above code to get the compensated, transformed data.
cd45 = fk.Dimension(
'APC-Cy7-A',
compensation_ref = 'custom_spill',
transformation_ref = 'biex'
)
cd3 = fk.Dimension(
'PE-Cy7-A',
compensation_ref = 'custom_spill',
transformation_ref = 'biex'
)
p = session.plot_scatter(
case,
x_dim = cd45,
y_dim = cd3,
gate_name = 'singlets'
)
show(p)
Creating ellipsoid gates can also be a little cumbersome in Python. Because an ellipse can be diagonal, we need to define a covariance matrix and a list of means, in order to create an ellipsoid gate.
NOTE: Covariance is just unstandardized correlation. If you want to create an ellipse that has a correlation between the two parameters (i.e. it is tilted or diagonal), you need to add some covariance. If you want to create an ellipse that shows no correlation between two parameters, leave the covariances 0 and just set the variance for each parameter individually (this sets the width of the ellipse in each axis).
In this example, we want an ellipse to encircle the CD45+ CD3+ events. As this population is not diagonal, we can set the covariance values in our matrix to 0, and just set the variances to control how wide the ellipse is around its mean. In the example below, I represent the covariance matrix as a NumPy array where the diagonal elements are the variances and the off-diagonals are the covariance.
We then use the EllipsoidGate() method to create the gate, giving the name of the population, the dimensions (remember that these Dimension definitions must include the relevant compensation and transformations), the covariance matrix, and the distance_square() argument, which is just a multiplier for the variance and can be used to shrink or expand the gate.
center = [3150, 3500]
cov = np.array(
[[300 ** 2, 0],
[0, 300 ** 2]]
)
print(cov)
cd3_gate = fk.gates.EllipsoidGate(
'cd3',
dimensions = [cd45, cd3],
coordinates = center,
covariance_matrix = cov,
distance_square = 1
)
session.add_gate(cd3_gate, gate_path = ('root', 'scatter', 'singlets'))
session.analyze_samples()
session.get_analysis_report()
[[90000 0] [ 0 90000]]
| sample | gate_path | gate_name | gate_type | quadrant_parent | parent | count | absolute_percent | relative_percent | level | |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | OpA_I2_C2_IM1_WB1_R1.fcs | (root,) | scatter | RectangleGate | None | root | 127122 | 31.78050 | 31.780500 | 1 |
| 1 | OpA_I2_C2_IM1_WB1_R1.fcs | (root, scatter) | singlets | PolygonGate | None | scatter | 115728 | 28.93200 | 91.036957 | 2 |
| 2 | OpA_I2_C2_IM1_WB1_R1.fcs | (root, scatter, singlets) | cd3 | EllipsoidGate | None | singlets | 61324 | 15.33100 | 52.989769 | 3 |
| 3 | OpA_I2_C2_IM1_WB1_R2.fcs | (root,) | scatter | RectangleGate | None | root | 132175 | 33.04375 | 33.043750 | 1 |
| 4 | OpA_I2_C2_IM1_WB1_R2.fcs | (root, scatter) | singlets | PolygonGate | None | scatter | 120130 | 30.03250 | 90.887082 | 2 |
| 5 | OpA_I2_C2_IM1_WB1_R2.fcs | (root, scatter, singlets) | cd3 | EllipsoidGate | None | singlets | 62621 | 15.65525 | 52.127695 | 3 |
| 6 | OpA_I2_C2_IM1_WB1_R3.fcs | (root,) | scatter | RectangleGate | None | root | 133872 | 33.46800 | 33.468000 | 1 |
| 7 | OpA_I2_C2_IM1_WB1_R3.fcs | (root, scatter) | singlets | PolygonGate | None | scatter | 120928 | 30.23200 | 90.331063 | 2 |
| 8 | OpA_I2_C2_IM1_WB1_R3.fcs | (root, scatter, singlets) | cd3 | EllipsoidGate | None | singlets | 62844 | 15.71100 | 51.968113 | 3 |
| 9 | OpA_I2_C2_IM1_WB2_R1.fcs | (root,) | scatter | RectangleGate | None | root | 116529 | 29.13225 | 29.132250 | 1 |
| 10 | OpA_I2_C2_IM1_WB2_R1.fcs | (root, scatter) | singlets | PolygonGate | None | scatter | 103375 | 25.84375 | 88.711823 | 2 |
| 11 | OpA_I2_C2_IM1_WB2_R1.fcs | (root, scatter, singlets) | cd3 | EllipsoidGate | None | singlets | 58972 | 14.74300 | 57.046675 | 3 |
| 12 | OpA_I2_C2_IM1_WB2_R2.fcs | (root,) | scatter | RectangleGate | None | root | 118876 | 29.71900 | 29.719000 | 1 |
| 13 | OpA_I2_C2_IM1_WB2_R2.fcs | (root, scatter) | singlets | PolygonGate | None | scatter | 106191 | 26.54775 | 89.329217 | 2 |
| 14 | OpA_I2_C2_IM1_WB2_R2.fcs | (root, scatter, singlets) | cd3 | EllipsoidGate | None | singlets | 60170 | 15.04250 | 56.662052 | 3 |
| 15 | OpA_I2_C2_IM1_WB2_R3.fcs | (root,) | scatter | RectangleGate | None | root | 119713 | 29.92825 | 29.928250 | 1 |
| 16 | OpA_I2_C2_IM1_WB2_R3.fcs | (root, scatter) | singlets | PolygonGate | None | scatter | 105490 | 26.37250 | 88.119085 | 2 |
| 17 | OpA_I2_C2_IM1_WB2_R3.fcs | (root, scatter, singlets) | cd3 | EllipsoidGate | None | singlets | 60300 | 15.07500 | 57.161816 | 3 |
| 18 | OpA_I2_C2_IM1_WB3_R1.fcs | (root,) | scatter | RectangleGate | None | root | 176388 | 44.09700 | 44.097000 | 1 |
| 19 | OpA_I2_C2_IM1_WB3_R1.fcs | (root, scatter) | singlets | PolygonGate | None | scatter | 158237 | 39.55925 | 89.709617 | 2 |
| 20 | OpA_I2_C2_IM1_WB3_R1.fcs | (root, scatter, singlets) | cd3 | EllipsoidGate | None | singlets | 73853 | 18.46325 | 46.672396 | 3 |
| 21 | OpA_I2_C2_IM1_WB3_R2.fcs | (root,) | scatter | RectangleGate | None | root | 173472 | 43.36800 | 43.368000 | 1 |
| 22 | OpA_I2_C2_IM1_WB3_R2.fcs | (root, scatter) | singlets | PolygonGate | None | scatter | 153580 | 38.39500 | 88.533020 | 2 |
| 23 | OpA_I2_C2_IM1_WB3_R2.fcs | (root, scatter, singlets) | cd3 | EllipsoidGate | None | singlets | 70976 | 17.74400 | 46.214351 | 3 |
| 24 | OpA_I2_C2_IM1_WB3_R3.fcs | (root,) | scatter | RectangleGate | None | root | 170989 | 42.74725 | 42.747250 | 1 |
| 25 | OpA_I2_C2_IM1_WB3_R3.fcs | (root, scatter) | singlets | PolygonGate | None | scatter | 140998 | 35.24950 | 82.460275 | 2 |
| 26 | OpA_I2_C2_IM1_WB3_R3.fcs | (root, scatter, singlets) | cd3 | EllipsoidGate | None | singlets | 39169 | 9.79225 | 27.779827 | 3 |
From this point, it may be easier to view the table of results one sample at a time, which we can do by looking at the report attribute of the object returned by get_gating_results().
gating_results = session.get_gating_results(case)
gating_results.report
| sample | gate_path | gate_name | gate_type | quadrant_parent | parent | count | absolute_percent | relative_percent | level | |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | OpA_I2_C2_IM1_WB1_R1.fcs | (root,) | scatter | RectangleGate | None | root | 127122 | 31.7805 | 31.780500 | 1 |
| 1 | OpA_I2_C2_IM1_WB1_R1.fcs | (root, scatter) | singlets | PolygonGate | None | scatter | 115728 | 28.9320 | 91.036957 | 2 |
| 2 | OpA_I2_C2_IM1_WB1_R1.fcs | (root, scatter, singlets) | cd3 | EllipsoidGate | None | singlets | 61324 | 15.3310 | 52.989769 | 3 |
Now we can visualise the gate as before.
p = session.plot_gate(case, 'cd3')
show(p)
Quadrant gates¶
Because Flowkit is compliant to the Gating-ML framework, creating quadrant gates is rather cumbersome. But as this is a common tool used for gating, we give an example of how to do it below. Let's create a plot of CD4 vs CD8 to help us define the quadrants we want to draw. As usual, we define Dimension objects for each marker, ensuring we use the compensated and transformed data.
cd4 = fk.Dimension(
'FITC-A',
compensation_ref = 'custom_spill',
transformation_ref = 'biex'
)
cd8 = fk.Dimension(
'PerCP-Cy5-5-A',
compensation_ref = 'custom_spill',
transformation_ref = 'biex'
)
p = session.plot_scatter(
case,
x_dim = cd4,
y_dim = cd8,
gate_name = 'cd3'
)
show(p)
We would like to split the regions of thie plot into 4 quadrants such that we separately gate the CD4s, CD8s (and double negatives and positives). We first need to define two QuadrantDividers, which are objects that define the split in each axis we'd like the quadrants to be divided at. We then just store these together in a list called quad_divs.
quad_div1 = fk.QuadrantDivider(
'cd4-div',
'FITC-A',
compensation_ref = 'custom_spill',
transformation_ref = 'biex',
values = [1700]
)
quad_div2 = fk.QuadrantDivider(
'cd8-div',
'PerCP-Cy5-5-A',
compensation_ref = 'custom_spill',
transformation_ref = 'biex',
values = [2000]
)
quad_divs = [quad_div1, quad_div2]
Next, we define the specific quadrants that will result from splitting up the space. We give each population a name using the quadrant_id argument, a list of the names of our divider_refs, and a list indicating the divider_ranges (where None can be taken to mean +/- infinity). For example, quad_1 below defines a population called 'cd4/8_double_pos in the quadrant with a CD4 intensity > 1700 and a CD8 intensity > 2000. In contrast, quad_2 defines a population called cd4 in the quadrant with a CD4 intensity > 1700 and a CD8 intensity <= 2000. We then just store these together in a list called quadrants.
quad_1 = fk.gates.Quadrant(
quadrant_id = 'cd4/8_double_pos',
divider_refs = ['cd4-div', 'cd8-div'],
divider_ranges = [(1700, None), (2000, None)]
)
quad_2 = fk.gates.Quadrant(
quadrant_id = 'cd4',
divider_refs = ['cd4-div', 'cd8-div'],
divider_ranges = [(1700, None), (None, 2000)]
)
quad_3 = fk.gates.Quadrant(
quadrant_id = 'cd8',
divider_refs = ['cd4-div', 'cd8-div'],
divider_ranges = [(None, 1700), (2000, None)]
)
quad_4 = fk.gates.Quadrant(
quadrant_id = 'cd4/8_double_neg',
divider_refs = ['cd4-div', 'cd8-div'],
divider_ranges = [(None, 1700), (None, 2000)]
)
quadrants = [quad_1, quad_2, quad_3, quad_4]
After all that work we can finally define the QuadrantGate object and add it to our Session object, proving the lists of Dividers and Quadrants we just defined.
quad_gate1 = fk.gates.QuadrantGate(
'cd4_cd8_quad',
dividers = quad_divs,
quadrants = quadrants
)
session.add_gate(quad_gate1, gate_path = ('root', 'scatter', 'singlets', 'cd3'))
print(session.get_gate_hierarchy())
session.analyze_samples()
gating_results = session.get_gating_results(case)
gating_results.report
root
╰── scatter
╰── singlets
╰── cd3
╰── cd4_cd8_quad
├── cd4/8_double_pos
├── cd4
├── cd8
╰── cd4/8_double_neg
| sample | gate_path | gate_name | gate_type | quadrant_parent | parent | count | absolute_percent | relative_percent | level | |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | OpA_I2_C2_IM1_WB1_R1.fcs | (root,) | scatter | RectangleGate | None | root | 127122 | 31.78050 | 31.780500 | 1 |
| 1 | OpA_I2_C2_IM1_WB1_R1.fcs | (root, scatter) | singlets | PolygonGate | None | scatter | 115728 | 28.93200 | 91.036957 | 2 |
| 2 | OpA_I2_C2_IM1_WB1_R1.fcs | (root, scatter, singlets) | cd3 | EllipsoidGate | None | singlets | 61324 | 15.33100 | 52.989769 | 3 |
| 4 | OpA_I2_C2_IM1_WB1_R1.fcs | (root, scatter, singlets, cd3) | cd4 | QuadrantGate | cd4_cd8_quad | cd3 | 39580 | 9.89500 | 64.542430 | 4 |
| 6 | OpA_I2_C2_IM1_WB1_R1.fcs | (root, scatter, singlets, cd3) | cd4/8_double_neg | QuadrantGate | cd4_cd8_quad | cd3 | 741 | 0.18525 | 1.208336 | 4 |
| 3 | OpA_I2_C2_IM1_WB1_R1.fcs | (root, scatter, singlets, cd3) | cd4/8_double_pos | QuadrantGate | cd4_cd8_quad | cd3 | 384 | 0.09600 | 0.626182 | 4 |
| 5 | OpA_I2_C2_IM1_WB1_R1.fcs | (root, scatter, singlets, cd3) | cd8 | QuadrantGate | cd4_cd8_quad | cd3 | 20619 | 5.15475 | 33.623051 | 4 |
We can plot our quadrants as before, but note that we must interact with the named QuadrantGate, not each quadrant individually.
p = session.plot_gate(
case,
gate_name = 'cd4_cd8_quad'
)
show(p)
Extracting population statistics¶
Now that we’ve finished gating our data, we probably want to extract some summary statistics per sample, per gate. The block below may look a little intimidating just because I haven't found an easier way of doing this, but let's go through it step by step.
First, we retrieve the compensation matrix and make a list of the gates we want statistics for.
Second, we create placeholder lists, one the same length as the number of gates and the other the same length as the number of samples.
Third, we iterate over each gate, within each sample, extracting a DataFrame of the gate events, adding columns to indicate the sample and gate, and appending it to the frame list. Once we've iterated over each gate in a sample we concatenate them into a single table for that sample that gets appended onto the frames list. Once the process has been repeated for each sample, we have a list of DataFrames that is once again concatenated into a single DataFrame.
Finally, the DataFrame we end up with has two rows of columns, which can be convenient for display but will cause us problems later when we join this table with another one. So we remove the second set of column labels using droplevel(1, axis = 1) (where the 1 indicates the second row and axis = 1 indicates we want to drop a column label, not a row label).
comp = session.get_comp_matrices()[0]
gates = ['scatter', 'singlets', 'cd3', 'cd4', 'cd8']
frame = [0] * len(gates)
frames = [0] * len(session.get_sample_ids())
for name_ind, name in enumerate(session.get_sample_ids()):
for gate_ind, gate in enumerate(gates):
frame[gate_ind] = session.get_gate_events(
name,
gate_name = gate,
matrix = comp,
transform = biex_xform
)
frame[gate_ind]['sample'] = name
frame[gate_ind]['gate_name'] = gate
frames[name_ind] = pd.concat(frame)
frames = pd.concat(frames)
frames = frames.droplevel(1, axis = 1)
frames
| pnn | FSC-A | FSC-H | SSC-A | FITC-A | PE-A | PerCP-Cy5-5-A | PE-Cy7-A | APC-A | APC-Cy7-A | Time | sample | gate_name |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 116219.039062 | 94956.0 | 60836.460938 | 2693.074882 | 1568.251127 | 388.405031 | 3182.470663 | 1576.743492 | 3262.084927 | 0.002000 | OpA_I2_C2_IM1_WB1_R1.fcs | scatter |
| 4 | 85385.437500 | 69691.0 | 46193.339844 | 2522.430369 | 2010.843012 | 743.455975 | 3365.240966 | 1779.159772 | 3146.113409 | 0.005000 | OpA_I2_C2_IM1_WB1_R1.fcs | scatter |
| 5 | 93750.718750 | 76884.0 | 36037.441406 | 647.642239 | 843.651185 | 292.877092 | 1031.396914 | 807.299111 | 2569.393741 | 0.006000 | OpA_I2_C2_IM1_WB1_R1.fcs | scatter |
| 6 | 69603.523438 | 57286.0 | 28271.621094 | 2665.484255 | 2890.146957 | 510.446184 | 3462.074423 | 1523.258805 | 3094.958703 | 0.008000 | OpA_I2_C2_IM1_WB1_R1.fcs | scatter |
| 7 | 69679.679688 | 59490.0 | 25382.960938 | 2450.164382 | 2544.513800 | 1167.203579 | 3379.043537 | 2024.134048 | 3104.151284 | 0.008000 | OpA_I2_C2_IM1_WB1_R1.fcs | scatter |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 399778 | 98247.523438 | 80400.0 | 46125.441406 | 754.832138 | 1631.039471 | 3185.086271 | 3339.554956 | 2004.359906 | 2953.744799 | 163.190002 | OpA_I2_C2_IM1_WB3_R3.fcs | cd8 |
| 399840 | 85462.718750 | 69455.0 | 43126.203125 | 1139.117260 | 1714.930117 | 3096.326423 | 3433.256091 | 1532.504700 | 2881.940483 | 163.210007 | OpA_I2_C2_IM1_WB3_R3.fcs | cd8 |
| 399857 | 88887.679688 | 73463.0 | 45411.519531 | 720.210632 | 1466.005584 | 3019.473351 | 3533.625819 | 2349.002653 | 2862.171702 | 163.216003 | OpA_I2_C2_IM1_WB3_R3.fcs | cd8 |
| 399953 | 72181.757812 | 57811.0 | 45337.800781 | 932.333396 | 1913.562101 | 3142.703219 | 3571.499703 | 1532.719127 | 2954.807660 | 163.255005 | OpA_I2_C2_IM1_WB3_R3.fcs | cd8 |
| 399983 | 79176.164062 | 63691.0 | 41063.980469 | 408.620567 | 2177.116483 | 3048.663364 | 3436.104832 | 0.000000 | 2971.021394 | 163.265991 | OpA_I2_C2_IM1_WB3_R3.fcs | cd8 |
3473947 rows × 12 columns
Phew! That was a lot of work! That gives us a nice DataFrame of single cell data of the gates of interest, where every event is annotated by its sample of orgin and population. What we'd like to do is calculate the median intensity for each marker separately for each population and sample. We can do this by first grouping the DataFrame by sample and gate_name, then using the agg() method to calculate the median of each marker.
medians = frames.groupby(['sample', 'gate_name']).agg(
{'FITC-A' : 'median',
'PE-A' : 'median',
'PerCP-Cy5-5-A' : 'median',
'PE-Cy7-A' : 'median',
'APC-A' : 'median',
'APC-Cy7-A' : 'median'}
)
medians
| pnn | FITC-A | PE-A | PerCP-Cy5-5-A | PE-Cy7-A | APC-A | APC-Cy7-A | |
|---|---|---|---|---|---|---|---|
| sample | gate_name | ||||||
| OpA_I2_C2_IM1_WB1_R1.fcs | cd3 | 2448.656418 | 2556.682626 | 1189.141019 | 3494.773710 | 1623.562180 | 3138.078764 |
| cd4 | 2504.224808 | 2640.803679 | 897.839425 | 3549.533868 | 1376.106169 | 3120.934437 | |
| cd8 | 928.206770 | 2167.162883 | 3135.428674 | 3415.849628 | 2340.662568 | 3176.701838 | |
| scatter | 1044.081758 | 2083.788766 | 1045.632647 | 3290.920241 | 2368.278801 | 3103.907538 | |
| singlets | 1037.251193 | 2157.618375 | 1077.148351 | 3312.846526 | 2436.006097 | 3105.897302 | |
| OpA_I2_C2_IM1_WB1_R2.fcs | cd3 | 2432.733272 | 2534.581248 | 1176.837506 | 3481.712588 | 1621.440822 | 3121.853209 |
| cd4 | 2488.699773 | 2625.121875 | 889.202780 | 3536.892817 | 1349.841359 | 3105.410951 | |
| cd8 | 940.987379 | 2092.636582 | 3109.487259 | 3401.402924 | 2356.932904 | 3158.606084 | |
| scatter | 1047.680557 | 2039.360009 | 1022.652481 | 3261.204134 | 2403.305023 | 3085.378309 | |
| singlets | 1040.099008 | 2116.236021 | 1056.676042 | 3285.735868 | 2477.167214 | 3087.681993 | |
| OpA_I2_C2_IM1_WB1_R3.fcs | cd3 | 2439.976137 | 2534.515412 | 1184.755494 | 3486.000428 | 1602.343445 | 3122.160008 |
| cd4 | 2496.429268 | 2626.753967 | 901.244845 | 3541.241251 | 1343.525952 | 3105.819597 | |
| cd8 | 932.733258 | 2122.145915 | 3123.510714 | 3406.154205 | 2308.175072 | 3159.893206 | |
| scatter | 1043.411092 | 2054.967578 | 1029.501463 | 3265.015720 | 2387.901735 | 3086.562506 | |
| singlets | 1036.006345 | 2133.128594 | 1062.286350 | 3291.753223 | 2454.058692 | 3088.411071 | |
| OpA_I2_C2_IM1_WB2_R1.fcs | cd3 | 2446.816191 | 2010.260980 | 959.769319 | 3571.557686 | 1158.809997 | 3171.344827 |
| cd4 | 2494.134986 | 2152.444117 | 688.367792 | 3598.911094 | 915.013242 | 3157.265908 | |
| cd8 | 927.506400 | 1667.676912 | 3103.854743 | 3522.644201 | 1755.949658 | 3212.207612 | |
| scatter | 1216.943779 | 1608.284567 | 950.229950 | 3477.090929 | 1685.449479 | 3133.937228 | |
| singlets | 1230.260450 | 1645.794738 | 979.548801 | 3490.060648 | 1734.659169 | 3134.715931 | |
| OpA_I2_C2_IM1_WB2_R2.fcs | cd3 | 2443.791408 | 2006.128292 | 966.909870 | 3570.432531 | 1089.668715 | 3168.107434 |
| cd4 | 2492.706541 | 2146.880026 | 689.011519 | 3597.886694 | 797.814300 | 3154.348270 | |
| cd8 | 926.134100 | 1673.836071 | 3108.752529 | 3522.098159 | 1738.492336 | 3206.085622 | |
| scatter | 1208.053982 | 1607.822970 | 956.748600 | 3475.217260 | 1651.215458 | 3129.957869 | |
| singlets | 1215.924986 | 1640.725396 | 984.305327 | 3488.475905 | 1702.610529 | 3131.053175 | |
| OpA_I2_C2_IM1_WB2_R3.fcs | cd3 | 2444.967229 | 2003.301148 | 968.547454 | 3570.279317 | 1068.120685 | 3163.345876 |
| cd4 | 2492.989829 | 2141.858292 | 695.833584 | 3596.225356 | 795.862377 | 3149.560576 | |
| cd8 | 931.038728 | 1670.993442 | 3106.944464 | 3521.126293 | 1720.975158 | 3202.239944 | |
| scatter | 1222.102893 | 1599.360908 | 961.231073 | 3476.549015 | 1630.831888 | 3125.688064 | |
| singlets | 1232.560075 | 1630.424770 | 989.636735 | 3488.491998 | 1689.012435 | 3126.179116 | |
| OpA_I2_C2_IM1_WB3_R1.fcs | cd3 | 2289.397332 | 2170.490828 | 1074.431139 | 3579.790835 | 1230.680056 | 3159.628587 |
| cd4 | 2350.065987 | 2225.420510 | 755.465517 | 3607.874985 | 804.935709 | 3155.022823 | |
| cd8 | 931.196247 | 2104.577307 | 3139.040108 | 3530.864005 | 2357.675594 | 3178.013547 | |
| scatter | 1072.792507 | 2015.653490 | 820.781538 | 3398.765742 | 2608.899327 | 3110.566650 | |
| singlets | 1059.101832 | 2081.829667 | 819.942678 | 3401.894328 | 2631.303365 | 3108.560180 | |
| OpA_I2_C2_IM1_WB3_R2.fcs | cd3 | 2295.457451 | 2169.441544 | 1091.625568 | 3583.036350 | 1207.304119 | 3150.577366 |
| cd4 | 2357.150206 | 2227.422388 | 772.460385 | 3612.616478 | 784.224166 | 3145.866571 | |
| cd8 | 932.391321 | 2089.846029 | 3150.548929 | 3533.503489 | 2338.635397 | 3167.712477 | |
| scatter | 1077.838810 | 2002.551671 | 841.845392 | 3410.039901 | 2585.106139 | 3098.337671 | |
| singlets | 1061.298415 | 2075.553306 | 837.341563 | 3408.659173 | 2609.467939 | 3095.200655 | |
| OpA_I2_C2_IM1_WB3_R3.fcs | cd3 | 2315.877559 | 2039.122338 | 1130.076376 | 3570.438694 | 869.444754 | 3001.444447 |
| cd4 | 2376.316260 | 2163.037784 | 814.797382 | 3589.654872 | 587.174382 | 2993.992196 | |
| cd8 | 910.938606 | 1808.606282 | 3160.328457 | 3537.160325 | 1746.120827 | 3020.115676 | |
| scatter | 1070.221959 | 2018.911086 | 934.225842 | 3412.363145 | 2401.378002 | 2887.457100 | |
| singlets | 1047.282742 | 2106.394827 | 919.460220 | 3397.143329 | 2430.302715 | 2878.502707 |
Now, let's join this DataFrame with the analysis report so we get one big DatFrame that has all of the proportions and medians for each sample and population. We extract the table of proportions using the get_analysis_report() method, then use the merge() method to join it to our table of medians, matching the rows together based on their common values of sample and gate_name.
props = session.get_analysis_report()
summ = props.merge(medians, on = ['sample', 'gate_name'])
summ
| sample | gate_path | gate_name | gate_type | quadrant_parent | parent | count | absolute_percent | relative_percent | level | FITC-A | PE-A | PerCP-Cy5-5-A | PE-Cy7-A | APC-A | APC-Cy7-A | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | OpA_I2_C2_IM1_WB1_R1.fcs | (root,) | scatter | RectangleGate | None | root | 127122 | 31.78050 | 31.780500 | 1 | 1044.081758 | 2083.788766 | 1045.632647 | 3290.920241 | 2368.278801 | 3103.907538 |
| 1 | OpA_I2_C2_IM1_WB1_R1.fcs | (root, scatter) | singlets | PolygonGate | None | scatter | 115728 | 28.93200 | 91.036957 | 2 | 1037.251193 | 2157.618375 | 1077.148351 | 3312.846526 | 2436.006097 | 3105.897302 |
| 2 | OpA_I2_C2_IM1_WB1_R1.fcs | (root, scatter, singlets) | cd3 | EllipsoidGate | None | singlets | 61324 | 15.33100 | 52.989769 | 3 | 2448.656418 | 2556.682626 | 1189.141019 | 3494.773710 | 1623.562180 | 3138.078764 |
| 3 | OpA_I2_C2_IM1_WB1_R1.fcs | (root, scatter, singlets, cd3) | cd4 | QuadrantGate | cd4_cd8_quad | cd3 | 39580 | 9.89500 | 64.542430 | 4 | 2504.224808 | 2640.803679 | 897.839425 | 3549.533868 | 1376.106169 | 3120.934437 |
| 4 | OpA_I2_C2_IM1_WB1_R1.fcs | (root, scatter, singlets, cd3) | cd8 | QuadrantGate | cd4_cd8_quad | cd3 | 20619 | 5.15475 | 33.623051 | 4 | 928.206770 | 2167.162883 | 3135.428674 | 3415.849628 | 2340.662568 | 3176.701838 |
| 5 | OpA_I2_C2_IM1_WB1_R2.fcs | (root,) | scatter | RectangleGate | None | root | 132175 | 33.04375 | 33.043750 | 1 | 1047.680557 | 2039.360009 | 1022.652481 | 3261.204134 | 2403.305023 | 3085.378309 |
| 6 | OpA_I2_C2_IM1_WB1_R2.fcs | (root, scatter) | singlets | PolygonGate | None | scatter | 120130 | 30.03250 | 90.887082 | 2 | 1040.099008 | 2116.236021 | 1056.676042 | 3285.735868 | 2477.167214 | 3087.681993 |
| 7 | OpA_I2_C2_IM1_WB1_R2.fcs | (root, scatter, singlets) | cd3 | EllipsoidGate | None | singlets | 62621 | 15.65525 | 52.127695 | 3 | 2432.733272 | 2534.581248 | 1176.837506 | 3481.712588 | 1621.440822 | 3121.853209 |
| 8 | OpA_I2_C2_IM1_WB1_R2.fcs | (root, scatter, singlets, cd3) | cd4 | QuadrantGate | cd4_cd8_quad | cd3 | 40585 | 10.14625 | 64.810527 | 4 | 2488.699773 | 2625.121875 | 889.202780 | 3536.892817 | 1349.841359 | 3105.410951 |
| 9 | OpA_I2_C2_IM1_WB1_R2.fcs | (root, scatter, singlets, cd3) | cd8 | QuadrantGate | cd4_cd8_quad | cd3 | 20830 | 5.20750 | 33.263602 | 4 | 940.987379 | 2092.636582 | 3109.487259 | 3401.402924 | 2356.932904 | 3158.606084 |
| 10 | OpA_I2_C2_IM1_WB1_R3.fcs | (root,) | scatter | RectangleGate | None | root | 133872 | 33.46800 | 33.468000 | 1 | 1043.411092 | 2054.967578 | 1029.501463 | 3265.015720 | 2387.901735 | 3086.562506 |
| 11 | OpA_I2_C2_IM1_WB1_R3.fcs | (root, scatter) | singlets | PolygonGate | None | scatter | 120928 | 30.23200 | 90.331063 | 2 | 1036.006345 | 2133.128594 | 1062.286350 | 3291.753223 | 2454.058692 | 3088.411071 |
| 12 | OpA_I2_C2_IM1_WB1_R3.fcs | (root, scatter, singlets) | cd3 | EllipsoidGate | None | singlets | 62844 | 15.71100 | 51.968113 | 3 | 2439.976137 | 2534.515412 | 1184.755494 | 3486.000428 | 1602.343445 | 3122.160008 |
| 13 | OpA_I2_C2_IM1_WB1_R3.fcs | (root, scatter, singlets, cd3) | cd4 | QuadrantGate | cd4_cd8_quad | cd3 | 40783 | 10.19575 | 64.895615 | 4 | 2496.429268 | 2626.753967 | 901.244845 | 3541.241251 | 1343.525952 | 3105.819597 |
| 14 | OpA_I2_C2_IM1_WB1_R3.fcs | (root, scatter, singlets, cd3) | cd8 | QuadrantGate | cd4_cd8_quad | cd3 | 20829 | 5.20725 | 33.143976 | 4 | 932.733258 | 2122.145915 | 3123.510714 | 3406.154205 | 2308.175072 | 3159.893206 |
| 15 | OpA_I2_C2_IM1_WB2_R1.fcs | (root,) | scatter | RectangleGate | None | root | 116529 | 29.13225 | 29.132250 | 1 | 1216.943779 | 1608.284567 | 950.229950 | 3477.090929 | 1685.449479 | 3133.937228 |
| 16 | OpA_I2_C2_IM1_WB2_R1.fcs | (root, scatter) | singlets | PolygonGate | None | scatter | 103375 | 25.84375 | 88.711823 | 2 | 1230.260450 | 1645.794738 | 979.548801 | 3490.060648 | 1734.659169 | 3134.715931 |
| 17 | OpA_I2_C2_IM1_WB2_R1.fcs | (root, scatter, singlets) | cd3 | EllipsoidGate | None | singlets | 58972 | 14.74300 | 57.046675 | 3 | 2446.816191 | 2010.260980 | 959.769319 | 3571.557686 | 1158.809997 | 3171.344827 |
| 18 | OpA_I2_C2_IM1_WB2_R1.fcs | (root, scatter, singlets, cd3) | cd4 | QuadrantGate | cd4_cd8_quad | cd3 | 40440 | 10.11000 | 68.574917 | 4 | 2494.134986 | 2152.444117 | 688.367792 | 3598.911094 | 915.013242 | 3157.265908 |
| 19 | OpA_I2_C2_IM1_WB2_R1.fcs | (root, scatter, singlets, cd3) | cd8 | QuadrantGate | cd4_cd8_quad | cd3 | 15217 | 3.80425 | 25.803771 | 4 | 927.506400 | 1667.676912 | 3103.854743 | 3522.644201 | 1755.949658 | 3212.207612 |
| 20 | OpA_I2_C2_IM1_WB2_R2.fcs | (root,) | scatter | RectangleGate | None | root | 118876 | 29.71900 | 29.719000 | 1 | 1208.053982 | 1607.822970 | 956.748600 | 3475.217260 | 1651.215458 | 3129.957869 |
| 21 | OpA_I2_C2_IM1_WB2_R2.fcs | (root, scatter) | singlets | PolygonGate | None | scatter | 106191 | 26.54775 | 89.329217 | 2 | 1215.924986 | 1640.725396 | 984.305327 | 3488.475905 | 1702.610529 | 3131.053175 |
| 22 | OpA_I2_C2_IM1_WB2_R2.fcs | (root, scatter, singlets) | cd3 | EllipsoidGate | None | singlets | 60170 | 15.04250 | 56.662052 | 3 | 2443.791408 | 2006.128292 | 966.909870 | 3570.432531 | 1089.668715 | 3168.107434 |
| 23 | OpA_I2_C2_IM1_WB2_R2.fcs | (root, scatter, singlets, cd3) | cd4 | QuadrantGate | cd4_cd8_quad | cd3 | 41060 | 10.26500 | 68.239987 | 4 | 2492.706541 | 2146.880026 | 689.011519 | 3597.886694 | 797.814300 | 3154.348270 |
| 24 | OpA_I2_C2_IM1_WB2_R2.fcs | (root, scatter, singlets, cd3) | cd8 | QuadrantGate | cd4_cd8_quad | cd3 | 15708 | 3.92700 | 26.106033 | 4 | 926.134100 | 1673.836071 | 3108.752529 | 3522.098159 | 1738.492336 | 3206.085622 |
| 25 | OpA_I2_C2_IM1_WB2_R3.fcs | (root,) | scatter | RectangleGate | None | root | 119713 | 29.92825 | 29.928250 | 1 | 1222.102893 | 1599.360908 | 961.231073 | 3476.549015 | 1630.831888 | 3125.688064 |
| 26 | OpA_I2_C2_IM1_WB2_R3.fcs | (root, scatter) | singlets | PolygonGate | None | scatter | 105490 | 26.37250 | 88.119085 | 2 | 1232.560075 | 1630.424770 | 989.636735 | 3488.491998 | 1689.012435 | 3126.179116 |
| 27 | OpA_I2_C2_IM1_WB2_R3.fcs | (root, scatter, singlets) | cd3 | EllipsoidGate | None | singlets | 60300 | 15.07500 | 57.161816 | 3 | 2444.967229 | 2003.301148 | 968.547454 | 3570.279317 | 1068.120685 | 3163.345876 |
| 28 | OpA_I2_C2_IM1_WB2_R3.fcs | (root, scatter, singlets, cd3) | cd4 | QuadrantGate | cd4_cd8_quad | cd3 | 41136 | 10.28400 | 68.218905 | 4 | 2492.989829 | 2141.858292 | 695.833584 | 3596.225356 | 795.862377 | 3149.560576 |
| 29 | OpA_I2_C2_IM1_WB2_R3.fcs | (root, scatter, singlets, cd3) | cd8 | QuadrantGate | cd4_cd8_quad | cd3 | 15700 | 3.92500 | 26.036484 | 4 | 931.038728 | 1670.993442 | 3106.944464 | 3521.126293 | 1720.975158 | 3202.239944 |
| 30 | OpA_I2_C2_IM1_WB3_R1.fcs | (root,) | scatter | RectangleGate | None | root | 176388 | 44.09700 | 44.097000 | 1 | 1072.792507 | 2015.653490 | 820.781538 | 3398.765742 | 2608.899327 | 3110.566650 |
| 31 | OpA_I2_C2_IM1_WB3_R1.fcs | (root, scatter) | singlets | PolygonGate | None | scatter | 158237 | 39.55925 | 89.709617 | 2 | 1059.101832 | 2081.829667 | 819.942678 | 3401.894328 | 2631.303365 | 3108.560180 |
| 32 | OpA_I2_C2_IM1_WB3_R1.fcs | (root, scatter, singlets) | cd3 | EllipsoidGate | None | singlets | 73853 | 18.46325 | 46.672396 | 3 | 2289.397332 | 2170.490828 | 1074.431139 | 3579.790835 | 1230.680056 | 3159.628587 |
| 33 | OpA_I2_C2_IM1_WB3_R1.fcs | (root, scatter, singlets, cd3) | cd4 | QuadrantGate | cd4_cd8_quad | cd3 | 48169 | 12.04225 | 65.222807 | 4 | 2350.065987 | 2225.420510 | 755.465517 | 3607.874985 | 804.935709 | 3155.022823 |
| 34 | OpA_I2_C2_IM1_WB3_R1.fcs | (root, scatter, singlets, cd3) | cd8 | QuadrantGate | cd4_cd8_quad | cd3 | 22991 | 5.74775 | 31.130760 | 4 | 931.196247 | 2104.577307 | 3139.040108 | 3530.864005 | 2357.675594 | 3178.013547 |
| 35 | OpA_I2_C2_IM1_WB3_R2.fcs | (root,) | scatter | RectangleGate | None | root | 173472 | 43.36800 | 43.368000 | 1 | 1077.838810 | 2002.551671 | 841.845392 | 3410.039901 | 2585.106139 | 3098.337671 |
| 36 | OpA_I2_C2_IM1_WB3_R2.fcs | (root, scatter) | singlets | PolygonGate | None | scatter | 153580 | 38.39500 | 88.533020 | 2 | 1061.298415 | 2075.553306 | 837.341563 | 3408.659173 | 2609.467939 | 3095.200655 |
| 37 | OpA_I2_C2_IM1_WB3_R2.fcs | (root, scatter, singlets) | cd3 | EllipsoidGate | None | singlets | 70976 | 17.74400 | 46.214351 | 3 | 2295.457451 | 2169.441544 | 1091.625568 | 3583.036350 | 1207.304119 | 3150.577366 |
| 38 | OpA_I2_C2_IM1_WB3_R2.fcs | (root, scatter, singlets, cd3) | cd4 | QuadrantGate | cd4_cd8_quad | cd3 | 46113 | 11.52825 | 64.969849 | 4 | 2357.150206 | 2227.422388 | 772.460385 | 3612.616478 | 784.224166 | 3145.866571 |
| 39 | OpA_I2_C2_IM1_WB3_R2.fcs | (root, scatter, singlets, cd3) | cd8 | QuadrantGate | cd4_cd8_quad | cd3 | 22265 | 5.56625 | 31.369759 | 4 | 932.391321 | 2089.846029 | 3150.548929 | 3533.503489 | 2338.635397 | 3167.712477 |
| 40 | OpA_I2_C2_IM1_WB3_R3.fcs | (root,) | scatter | RectangleGate | None | root | 170989 | 42.74725 | 42.747250 | 1 | 1070.221959 | 2018.911086 | 934.225842 | 3412.363145 | 2401.378002 | 2887.457100 |
| 41 | OpA_I2_C2_IM1_WB3_R3.fcs | (root, scatter) | singlets | PolygonGate | None | scatter | 140998 | 35.24950 | 82.460275 | 2 | 1047.282742 | 2106.394827 | 919.460220 | 3397.143329 | 2430.302715 | 2878.502707 |
| 42 | OpA_I2_C2_IM1_WB3_R3.fcs | (root, scatter, singlets) | cd3 | EllipsoidGate | None | singlets | 39169 | 9.79225 | 27.779827 | 3 | 2315.877559 | 2039.122338 | 1130.076376 | 3570.438694 | 869.444754 | 3001.444447 |
| 43 | OpA_I2_C2_IM1_WB3_R3.fcs | (root, scatter, singlets, cd3) | cd4 | QuadrantGate | cd4_cd8_quad | cd3 | 25046 | 6.26150 | 63.943425 | 4 | 2376.316260 | 2163.037784 | 814.797382 | 3589.654872 | 587.174382 | 2993.992196 |
| 44 | OpA_I2_C2_IM1_WB3_R3.fcs | (root, scatter, singlets, cd3) | cd8 | QuadrantGate | cd4_cd8_quad | cd3 | 12854 | 3.21350 | 32.816768 | 4 | 910.938606 | 1808.606282 | 3160.328457 | 3537.160325 | 1746.120827 | 3020.115676 |
Let's save this as a .csv file so we can use it for downstream analyses later.
summ.to_csv('data/summary_statistics.csv')
Exporting gated populations as .fcs files¶
Let's say we want to write out a new .fcs file for every sample, that includes only the gated CD8 population (a more realistic scenario would be exporting .fcs files after removing debris, doublets, and dead cells ready for clustering). We can do this by extracting a DataFrame per sample of just the events of interest, converting this to a Sample object, and using the export() method (you may need to create the data/exported directory first).
for id in session.get_sample_ids():
samp = session.get_gate_events(id, 'cd8')
samp = fk.Sample(samp, id)
samp.export(
id,
source = 'raw',
include_metadata = True,
directory = 'data/exported'
)